fix: Hardcode Legacy behavior to True to resolve warning. #446

Luka-D · 2025-01-22T22:05:50Z

Description of the change

This PR sets Legacy=True explicitly in the AutoTokenizer. sft_trainer.py keeps its current behavior, but the following warning no longer appears:

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.

More discussion can be found on the nature of the legacy behavior here: huggingface/transformers#24565.

Even when Legacy is not explicitly set, the tokenizer defaults it to True. Setting it explicitly therefore does not change the current functionality.
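
A minimal sketch of the kind of call this change affects, assuming the tokenizer is loaded through `AutoTokenizer.from_pretrained` and that the `legacy` keyword is forwarded to the Llama tokenizer class. The helper name `load_tokenizer` and its `loader` parameter are hypothetical and are not names from sft_trainer.py; `loader` exists only so the sketch can be exercised without downloading a model.

```python
def load_tokenizer(model_name_or_path, loader=None, **extra_kwargs):
    """Load a tokenizer with legacy=True set explicitly (hypothetical helper)."""
    if loader is None:
        # Imported lazily so the sketch can run without transformers installed
        # when a stand-in loader is supplied.
        from transformers import AutoTokenizer

        loader = AutoTokenizer.from_pretrained
    # legacy=True is the value the tokenizer would pick by default anyway;
    # stating it explicitly is what is expected to silence the warning.
    # A caller-supplied legacy=... in extra_kwargs would override it.
    kwargs = {"legacy": True, **extra_kwargs}
    return loader(model_name_or_path, **kwargs)
```

In real use this would be called as `load_tokenizer("path/to/model", use_fast=True)`, where the model path is a placeholder.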

I have done some testing on the impact of Legacy behavior when tuning with sft_trainer and have included my results below.

Related issue number

Resolving a warning raised in Issue #1205.

How to verify the PR

Run tuning locally and verify that the warning message from above does not appear anymore.
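
One way to make this check less manual is to search the captured tuning output for the opening words of the warning. This is only a sketch: the function name is hypothetical, and it assumes the run's stdout/stderr has already been saved as text.

```python
# Opening words of the warning quoted in the description above.
LEGACY_WARNING_PREFIX = "You are using the default legacy behaviour"


def legacy_warning_present(log_text: str) -> bool:
    """Return True if the captured log text contains the legacy warning."""
    return LEGACY_WARNING_PREFIX in log_text
```

After a tuning run, `legacy_warning_present(captured_output)` would be expected to return False with this change applied.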

Was the PR tested

I created one image with Legacy = True and one with Legacy = False and tested each using the Travis CI flow. The changes were tested on llama3 and granite, using both LoRA tuning and Fine Tuning. I have graphed the results here:

The F1 micro score was identical for both models when using Fine Tuning. When using LoRA tuning, both models showed a small improvement in F1 micro score when Legacy was set to True. The difference is very small, however, and might be within the margin of error of the test. We concluded that the results are essentially the same regardless of the Legacy setting.

We also wondered whether the setting would change the EOS and BOS tokens, so I ran tuning locally and compared the tokenized outputs. The outputs were the same for both settings, at 1 epoch and at 5 epochs. I have included the tokenized output files below for comparison.

Legacy True 1 Epoch.txt
Legacy False 1 Epoch.txt
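
The comparison described above can be sketched as a position-by-position diff of two token-ID sequences. This is an illustration of the idea, not the script used for the attached files; the function name and the sample IDs in the usage note are made up.

```python
from itertools import zip_longest


def diff_token_ids(ids_a, ids_b):
    """Return (index, id_a, id_b) for every position where the sequences differ.

    A missing position in the shorter sequence is reported as None, so an
    extra BOS or EOS token at either end shows up as a difference.
    """
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip_longest(ids_a, ids_b))
        if a != b
    ]
```

For example, `diff_token_ids([1, 15, 27, 2], [1, 15, 27, 2])` returns an empty list, which corresponds to the "outputs were the same" result reported here, while a sequence missing its final token would produce one entry for that position.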

In conclusion, we determined that the impact of the Legacy setting on the tokenizer was negligible. We decided to keep the functionality the same as it is, but to hardcode Legacy=True to avoid the warning appearing.

I have added >=1 unit test(s) for every new method I have added.
I have ensured all unit tests pass

github-actions · 2025-01-22T22:06:01Z

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

aluu317

We had a discussion about keeping the same legacy tokenizer behavior. Explicitly setting the value resolves the warning. Looks good to me!

willmj

LGTM

Luka-D added 4 commits January 13, 2025 11:54

Set Legacy to True

50c7612

Testing Legacy behavior set to True Signed-off-by: Luka Dojcinovic <[email protected]>

Changed legacy to False

4071a38

Signed-off-by: Luka Dojcinovic <[email protected]>

Set Legacy = True

5ef6a0a

We've decided to maintain the previous legacy behavior, coding it here to avoid that warning. Signed-off-by: Luka Dojcinovic <[email protected]>

Merge branch 'foundation-model-stack:main' into fix-legacy-behavior

b86d0cd

Luka-D requested review from anhuong, Ssukriti, aluu317, fabianlim and kmehant as code owners January 22, 2025 22:05

Luka-D changed the title ~~Fix: Hardcode Legacy behavior to True to resolve warning.~~ fix: Hardcode Legacy behavior to True to resolve warning. Jan 22, 2025

github-actions bot added the fix label Jan 22, 2025

aluu317 approved these changes Jan 23, 2025

willmj approved these changes Jan 23, 2025
